Uncontrolled format string is a type of

code injection Code injection is a computer security exploit where a program fails to correctly process external data, such as user input, causing it to interpret the data as executable commands. An attacker using this method "injects" code into the program whi ...

vulnerability Vulnerability refers to "the quality or state of being exposed to the possibility of being attacked or harmed, either physically or emotionally." The understanding of social and environmental vulnerability, as a methodological approach, involves ...

discovered around 1989 that can be used in

security exploit An exploit is a method or piece of code that takes advantage of Vulnerability (computer security), vulnerabilities in software, Application software, applications, Computer network, networks, operating systems, or Computer hardware, hardware, typic ...

s. Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf(). A malicious user may use the %s and %x format tokens, among others, to print data from the

call stack In computer science, a call stack is a Stack (abstract data type), stack data structure that stores information about the active subroutines and block (programming), inline blocks of a computer program. This type of stack is also known as an exe ...

or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n format token, which commands printf() and similar functions to write the number of bytes formatted to an address stored on the stack.

Details

A typical exploit uses a combination of these techniques to take control of the

instruction pointer The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...

(IP) of a process, for example by forcing a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious

shellcode In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised ma ...

. The padding parameters to format specifiers are used to control the number of bytes output and the %x token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n format token can then overwrite with the address of the malicious code to execute. This is a common vulnerability because format bugs were previously thought harmless and resulted in vulnerabilities in many common tools. MITRE's CVE project lists roughly 500 vulnerable programs as of June 2007, and a trend analysis ranks it the 9th most-reported vulnerability type between 2001 and 2006. Format string bugs most commonly appear when a programmer wishes to output a string containing user supplied data (either to a file, to a buffer, or to the user). The programmer may mistakenly write printf(buffer) instead of printf("%s", buffer). The first version interprets buffer as a format string, and parses any formatting instructions it may contain. The second version simply prints a string to the screen, as the programmer intended. Both versions behave identically in the absence of format specifiers in the string, which makes it easy for the mistake to go unnoticed by the developer. Format bugs arise because C's argument passing conventions are not

type-safe In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that i ...

. In particular, the varargs mechanism allows

function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-orie ...

s to accept any number of arguments (e.g. printf) by "popping" as many

argument An argument is a series of sentences, statements, or propositions some of which are called premises and one is the conclusion. The purpose of an argument is to give reasons for one's conclusion via justification, explanation, and/or persu ...

s off the

as they wish, trusting the early arguments to indicate how many additional arguments are to be popped, and of what types. Format string bugs can occur in other programming languages besides C, such as Perl, although they appear with less frequency and usually cannot be exploited to execute code of the attacker's choice.

History

Format bugs were first noted in 1989 by the

fuzz testing In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptio ...

work done at the University of Wisconsin, which discovered an "interaction effect" in the

C shell The C shell (csh or the improved version, tcsh) is a Unix shell created by Bill Joy while he was a graduate student at University of California, Berkeley in the late 1970s. It has been widely distributed, beginning with the 2BSD release of the ...

(csh) between its

command history Command history is a feature in many operating system shells, computer algebra programs, and other software that allows the user to recall, edit and rerun previous commands. Command line history was added to Unix in Bill Joy's C shell of 1978; ...

mechanism and an error routine that assumed safe string input. The use of format string bugs as an

attack vector In computer security, an attack vector is a specific path, method, or scenario that can be exploited to break into an IT system, thus compromising its security. The term was derived from the corresponding notion of vector in biology. An attack ve ...

was discovered in September 1999 by Tymm Twillman during a

security audit An information security audit is an audit of the level of information security in an organization. It is an independent review and examination of system records, activities, and related documents. These audits are intended to improve the level of i ...

of the ProFTPD daemon. The audit uncovered an snprintf that directly passed user-generated data without a format string. Extensive tests with contrived arguments to printf-style functions showed that use of this for privilege escalation was possible. This led to the first posting in September 1999 on the

Bugtraq Bugtraq was an electronic mailing list dedicated to issues about computer security. On-topic issues are new discussions about vulnerabilities, vendor security-related announcements, methods of exploitation, and how to fix them. It was a high-volume ...

mailing list regarding this class of vulnerabilities, including a basic exploit. It was still several months, however, before the security community became aware of the full dangers of format string vulnerabilities as exploits for other software using this method began to surface. The first exploits that brought the issue to common awareness (by providing remote root access via code execution) were published simultaneously on the

list in June 2000 by

Przemysław Frasunek Przemysław Frasunek (also known as venglin, born 6 May 1983) is a " white hat" hacker from Poland. He has been a frequent Bugtraq poster since late in the 1990s, noted for one of the first published successful software exploits for the forma ...

and a person using the nickname ''tf8''. They were shortly followed by an explanation, posted by a person using the nickname ''lamagra''. "Format bugs" was posted to the

list by Pascal Bouchareine in July 2000. The seminal paper "Format String Attacks" by Tim Newsham was published in September 2000 and other detailed technical explanation papers were published in September 2001 such as ''Exploiting Format String Vulnerabilities'', by team

Teso Teso or TESO may refer to: Places * Têso, a Portuguese hamlet * Teso District, Kenya, a defunct administrative district in the former Western Province of Kenya * Teso District, Uganda, a district in Uganda now known as Teso sub-region Language ...

Prevention in compilers

Many compilers can statically check format strings and produce warnings for dangerous or suspect formats. In the GNU Compiler Collection, the relevant compiler flags are, -Wall,-Wformat, -Wno-format-extra-args, -Wformat-security, -Wformat-nonliteral, and -Wformat=2. Most of these are only useful for detecting bad format strings that are known at compile-time. If the format string may come from the user or from a source external to the application, the application must validate the format string before using it. Care must also be taken if the application generates or selects format strings on the fly. If the GNU C library is used, the -D_FORTIFY_SOURCE=2 parameter can be used to detect certain types of attacks occurring at run-time. The -Wformat-nonliteral check is more stringent.

Detection

Contrary to many other security issues, the root cause of format string vulnerabilities is relatively easy to detect in x86-compiled executables: For printf-family functions, proper use implies a separate argument for the format string and the arguments to be formatted. Faulty uses of such functions can be spotted by simply counting the number of arguments passed to the function; an "argument deficiency" is then a strong indicator that the function was misused.

Detection in x86-compiled binaries

Counting the number of arguments is often made easy on x86 due to a calling convention where the caller removes the arguments that were pushed onto the stack by adding to the stack pointer after the call, so a simple examination of the stack correction yields the number of arguments passed to the printf-family function.'

References

External links

Introduction to format string exploits
2013-05-02, by Alex Reece * scut / team-

TESO Teso or TESO may refer to: Places * Têso, a Portuguese hamlet * Teso District, Kenya, a defunct administrative district in the former Western Province of Kenya * Teso District, Uganda, a district in Uganda now known as Teso sub-region Language ...

br>Exploiting Format String Vulnerabilities
v1.2 2001-09-09
WASC Threat Classification - Format String Attacks

CERT Secure Coding Standards

CERT Secure Coding Initiative

Known vulnerabilities
at MITRE's CVE project.
Secure Programming with GCC and GLibc
(2008), by Marcel Holtmann {{DEFAULTSORT:Format String Attack Computer security exploits